SYNOPSIS


The main purpose of this study was to discover the value, as diagnostic
instruments, of certain group silent reading tests at the elementary school
level. The tests were administered to one hundred sixty subjects and data
were compiled for both raw scores and grade scores for the rate, vocabulary, and
comprehension sub-tests. Non-statistical criteria for evaluating the diagnostic
values of these tests were selected from related studies.

The  first  statistical  criterion  for  the  use  of  the  tests  for  diagnostic 
purposes  was  the  degree  of  normalcy  exhibited  by  the  distributions  of  raw 
scores  yielded  by  the  tests.  A  second  criterion  was  based  on  the  transmuted 
grade  scores.  The  mean  was  computed  for  each  test  and  the  average  of  these 
means  was  taken  as  a  criterion  score.  The  tests  were  subsequently  rated 
according  to  their  abilities  to  yield  grade  scores  which  approximated  this 
general consensus. Finally the non-statistical criteria were applied and
the findings of both the statistical and the non-statistical evaluations were
summarized.

For  the  purpose  of  diagnosing  vocabulary  and  comprehension  the  untimed 
tests  which  were  widest  in  scope  appeared  to  be  superior  to  the  timed  tests. 
Rate  tests  which  provided  constant  checks  on  comprehension  and  which  continued 
for  a  period  of  five  minutes  or  more  proved  to  be  superior  to  those  employing 
one  minute  intervals  with  no  checks  on  comprehension. 

Recommendations  for  the  use  of  the  tests  as  diagnostic  instruments  were 
also  made. 
CHAPTER I


THE  PROBLEM 

BACKGROUND  OF  THE  PROBLEM 

The need for research on the use of standardized tests as diagnostic
instruments became apparent to the investigator during the course of the 1950
reading clinic at the Summer Session of the University of Alberta. This is
further borne out by Helen Robinson1 who concluded that additional research
was required into problems relating to reading, more particularly the development
of better diagnostic instruments. A review of such publications as the
Journal of Educational Research2, the Journal of Educational Psychology3, and
the Elementary School Journal4, reveals an intensified interest in reading
among American educationalists. For example, the February, 1950, issue of
the Journal of Educational Research was devoted entirely to articles on
reading, but there was insufficient space for all contributions and some had
to be carried over to the March issue. In his 1949 summary Gray5 listed
ninety-two scientific studies relating to reading.

Reading  tests  of  various  kinds  have  long  been  an  accepted  part  of  a  good 
elementary  program  of  reading  instruction  but  modern  emphasis  on  diagnostic 
procedures  has  brought  marked  changes  in  the  use  of  tests.  Older  reading  tests, 
though sometimes titled "diagnostic", do not provide profile charts nor do the

1Helen Robinson, Why Children Fail in Reading, 1946, p. 233.

2Journal of Educational Research, 1927 to 1950.

3Journal of Educational Psychology, 1927 to 1950.

4The Elementary School Journal, 1927 to 1950.

5W. S. Gray, "Summary of Reading Investigations, July 1, 1948 to June 30,
1949", Journal of Educational Research, V. 43, No. 6, Feb. 1950, p.p. 401-39.
manuals accompanying them offer suggestions for the use of the tests in
diagnosing  reading  problems  of  pupils  making  unsatisfactory  scores  on  any 
or  all  of  their  parts.  On  the  other  hand,  such  more  recently  devised  tests 
as  the  Iowa  Silent  Reading  Tests  and  the  Diagnostic  Examination  by  Van  Wagenen 
and  Dvorak  do  provide  these  essentials. 

Many  excellent  tests  and  procedures  have  been  devised  by  such  experts  as 
Dr. E. A. Betts6 and W. Kottmeyer7 for use in a clinical situation. In
addition, several outstanding individual diagnostic reading tests have been
developed. Of particular interest among the more recent of these are the
tests devised by Gates8 and Durrell9. Helen Robinson10 has reported success
in  getting  at  reading  problems  when  individual  diagnostic  procedures  were 
followed,  but  the  classroom  teacher  in  Alberta  seems  to  find  a  need  for  less 
time-consuming  procedures  and  often  has  not  the  resources  to  deal  separately 
with  each  pupil.  Therefore,  the  use  of  group  procedures  and  group  tests  is 
of  paramount  importance. 

Harris11 states that when conducting a diagnosis of a child's reading
difficulties, the two most important questions to be answered are, how difficult
a book can this child read? and, how does the pupil read? Since the latter
question  calls  for  an  investigation  of  the  particular  reading  skills  the  child 
has  developed,  an  attempt  will  be  made  to  determine  the  extent  to  which  these 
questions  can  be  answered  by  group  tests.  Dolch  says,  "a  test  should  test  some 
specific  thing  which  can  be  taught,  and  which  should  be  taught  in  a  particular 

6E. A. Betts, Foundations of Reading Instruction, Ch. XXI.

7W. Kottmeyer, Handbook for Remedial Reading, Ch. 6.

8A. I. Gates, Gates Reading Diagnostic Tests, Kit.

9D. Durrell, Analysis of Reading Difficulties, Kit.

10Helen Robinson, Op. Cit., Ch. VII.

11A. J. Harris, How to Increase Reading Ability, p. 73.
grade or range of grades."12 The things which reading tests measure and the
use which may be made of the obtained measures bear further investigation.

STATEMENT OF THE PROBLEM

The  main  purpose  of  this  study  is  to  discover  the  diagnostic  value  of 
certain group silent reading tests at the elementary school level. The investigator
will attempt to determine the extent to which these tests substantiate
or  contradict  one  another,  the  reading  deficiencies  revealed  by  each  test,  the 
value  of  each  to  the  classroom  teacher  as  an  instrument  of  diagnosis,  and  the 
place  of  each  in  a  diagnostic  program. 

A  diagnostic  test  of  silent  reading  ability  is  generally  regarded  as  one 
which provides accurate information on how a subject's achievements in such
basic abilities as rate of reading, vocabulary, and comprehension compare, and
which  also  provides  scores  that  enable  a  teacher  to  determine  how  each  testee 
stands  in  a  number  of  sub-skills  in  relation  to  his  general  reading  level. 
Moreover,  each  sub-test  should  be  of  sufficient  length  and  should  possess 
enough  variety  to  test  adequately  the  level  of  achievement  of  each  pupil  in 
the  particular  aspect  of  silent  reading  that  it  purports  to  measure. 

Teachers  are  frequently  faced  with  the  problem  of  analysing  a  pupil's 
silent  reading  abilities  on  the  basis  of  scores  made  on  standardized  tests. 

This  study  seeks  to  explore  the  value  of  such  tests  in  acquiring  knowledge 
about  a  pupil  to  the  end  that  his  development  may  be  more  effectively  guided. 
Therefore,  the  emphasis  does  not  center  on  the  disabled  reader,  but  is  on  the 
efficiency  with  which  the  tests  provide  knowledge  about  the  reading  abilities 

12E. W. Dolch, Problems in Reading, p.2QS.
possessed by each member of the group.

More  specifically  this  study  seeks  to  evaluate  the  tests  in  terms  of  their 
capacities in the following respects:

(a)  their  ability  to  yield  normal  distributions  of  raw  scores; 

(b)  their  ability  to  yield  grade  placement  scores  which  correspond  to 
the  average  or  consensus  of  the  tests; 

(c)  their  ability  to  report  accurately  a  subject’s  reading  level  in 
each  of  rate,  vocabulary,  and  comprehension. 
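
Criterion (b) above can be illustrated with a minimal sketch. It assumes that the "consensus" is the average of the per-test mean grade scores and that a test is rated by the absolute distance of its own mean from that value. The function name, the test labels, and the scores are hypothetical and are not data from this study.

```python
from statistics import mean

def rate_tests_by_consensus(grade_scores):
    """Rank tests by how closely their mean grade score matches the consensus.

    grade_scores maps a test name to the grade scores earned on that test.
    """
    # Mean grade score yielded by each test.
    test_means = {name: mean(scores) for name, scores in grade_scores.items()}
    # Criterion score: the average of the per-test means (an assumption
    # about how the consensus is formed).
    criterion = mean(list(test_means.values()))
    # A smaller absolute deviation means closer agreement with the consensus.
    deviations = {name: abs(m - criterion) for name, m in test_means.items()}
    ranking = sorted(deviations, key=deviations.get)
    return criterion, test_means, ranking

# Hypothetical grade scores for three unnamed tests.
example = {
    "Test A": [4.0, 5.0, 6.0],
    "Test B": [5.0, 6.0, 7.0],
    "Test C": [4.5, 5.5, 6.5],
}
criterion, test_means, ranking = rate_tests_by_consensus(example)
```

In this illustration the test whose mean lies nearest the criterion score is placed first in the ranking; other measures of agreement could equally be substituted.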

RELATED  STUDIES 

Although an increasing number of studies are being done on problems relating
to the diagnosis of reading difficulties, few have been attempted which
employ  group  silent  reading  tests  as  instruments  of  diagnosis.  However,  a 
number  of  investigations  have  been  conducted  to  evaluate  the  methods  used  in 
tests  designed  to  measure  various  reading  abilities.  The  first  group  of  reviews 
summarized  in  this  chapter  deals  with  the  problem  of  measuring  general  reading 
ability,  while  succeeding  ones  are  concerned  more  particularly  with  problems 
involved  in  measuring  rate,  vocabulary,  and  comprehension. 

(a) General Problems. In a study entitled, "Problems of Measurement of
Reading Ability", Traxler13 asserts that reading is one of the most difficult
of all abilities to measure accurately. He maintains that the problems of measuring
the reading ability of school children objectively by means of group tests
are mainly due to the intricate nature of the reading process, but lists the
following as contributing factors: (1) there is lack of agreement among specialists
in this field concerning what reading is; (2) experts disagree on what

13A. E. Traxler, "Problems of Measurement of Reading Ability", The School
Review,  V.  52,  No.  8,  Oct.  1944,  p.p.  491-95. 
reading comprehension includes; (3) because many of our words have several
meanings it is not possible to state from a single correct response that a
pupil knows all the meanings of a word; (4) no one has yet devised an entirely
satisfactory means of measuring rate of reading which also provides a check
on  comprehension. 

Burkart14 surveyed the literature dealing with the analysis of reading
ability and drew up a list of two hundred fourteen skills and abilities involved
in reading. With the help of five judges she reduced the list to eighty-nine
separate items. These items were then classified under the six general headings
of: observation, research abilities, vocabulary abilities, oral reading abilities,
hygienic abilities, and aesthetic abilities. These were submitted to a
group  of  reading  specialists  with  the  request  that  they  rate  each  ability  as 
highly  important,  important,  or  unimportant.  The  items  were  then  classified 
according  to  the  preference  indicated  by  the  forty  experts  who  replied  to  the 
questionnaire.  The  ability  to  be  attentive  while  reading  was  listed  first  with 
comprehension,  rate,  and  vocabulary,  the  three  basic  abilities  selected  for  this 
study,  following  closely  in  the  order  stated. 

The two major conclusions drawn from this study were: (1) that reading is
a complex activity made up of motor, sensory, and intellectual abilities;
(2) that educators regard the motor and sensory aspects of reading of less
importance than the mental or intellectual aspects.

A study of invalidities of general reading tests was made by J. H. Shores15.
Toward  the  goal  of  higher  validity  for  reading  tests  he  suggested:  (1)  that  we 
give  serious  thought  and  experimentation  toward  an  evaluation  of  the  concept  of 

14K. H. Burkart, "An Analysis of Reading Abilities", Journal of Educational
Research, V. 38, No. 6, Feb. 1945, p.p. 430-39.

15J.  H.  Shores,  "Some  Considerations  of  Invalidities  of  General  Reading 
Tests",  Journal  of  Educational  Research,  V.  40,  No.  6,  Feb.  1947,  p.p.  448-57. 
general reading ability, for we may be trying to measure something which does
not exist as such; (2) that both depth and breadth of comprehension be measured
in the same test with separate scores for each; (3) that reading rate
and comprehension are inseparable and should be measured with the same materials,
and  further  that  the  time  taken  to  answer  the  questions  on  comprehension  should 
not  be  included  in  the  rate  measure;  (4)  that  reading  tests  used  in  the 
intermediate  grades  or  above  should  employ  materials  chosen  from  a  variety  of 
content  areas,  and  if  possible,  that  these  materials  be  chosen  from  several 
authors  within  each  content  area;  (5)  that  an  attempt  be  made  to  give  the 
interest  factor  consideration  by  either  attempting  to  develop,  prior  to  the  test, 
interest  in  the  content  area  of  the  test  selections,  or  by  consciously  selecting 
materials  from  a  variety  of  interests  in  the  hope  that  the  undesirable  effects 
of  the  interest  factor  will  cancel  themselves;  (6)  that  the  test  directions 
clarify the motivation factor by attempting to motivate along certain preconceived
lines; (7) that the test clearly set the reader's purpose for each
passage  prior  to  the  reading  of  that  passage;  (8)  that  the  assumption  of  a 
reasonably  equal  opportunity  for  the  experience  background  required  by  the 
testing  materials  be  clearly  made  and  that  the  testing  materials  be  selected 
with  this  assumption  in  mind. 

That comparisons and evaluations of reading tests may make valuable contributions
to educational research is amply illustrated by George Spache17, who
made  a  comparison  of  certain  oral  reading  tests.  Using  data  from  his  file  of 
case  studies  he  evolved  a  number  of  pertinent  suggestions  for  administering 
Durrell's  Analysis  of  Reading  Difficulty  and  Gray’s  Oral  Check  Tests.  He  gives 

17G. Spache, "A Comparison of Certain Oral Reading Tests", Journal of
Educational  Research,  V.  43,  No.  6,  Feb.  1950,  p.p.  441-52. 
practical suggestions for modifying the administration of these tests to pupils
known  to  be  disabled  in  reading  or  suspected  of  suffering  a  reading  handicap. 

He  found  that  Gray's  tests  were  too  advanced  for  use  with  retarded  readers  and 
suggested  modifications  for  administering  them.  When  testing  handicapped 
children with Durrell's tests, Spache recommended that the beginning point
should  be  one  grade  below  the  pupil's  estimated  grade  level  instead  of  at  that 
level  as  directed  in  the  manual.  Finally  he  concluded  that  both  tests  yield 
fairly  accurate  estimates  of  a  child's  actual  reading  level  when  administered 
in  accordance  with  his  recommendations. 

Gates18 gave a battery of twenty-eight specific tests to each of thirty-two
third grade children for the purpose of securing data to be used in improving
the Gates Reading Diagnostic Tests. The findings justified the conclusion
that  the  data  secured  through  the  use  of  diagnostic  tests  enabled  the  examiner, 
with  some  degree  of  success,  to  size  up  various  processes  or  activities  or 
operations  and  to  relate  them  to  other  component  abilities  and  to  reading 
achievement  in  general.  For  example,  the  diagnostic  tests  would  seem  to  be 
useful  for  determining  whether  the  pupil's  difficulty  in  word  recognition  is 
due  to  deficiencies  in  ability  to  blend  letter  sounds  or  to  any  one  of  several 
other  deficiencies.  The  range  of  correlations  between  general  silent  reading 
ability  and  specialized  reading  tests  was  from  .30  up  to  .82. 
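
Coefficients such as those reported above are ordinarily product-moment correlations. The following is a minimal sketch of that computation, assuming paired scores for the same pupils on two tests; the function name and the sample figures are hypothetical and are not taken from the Gates study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length, at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores of five pupils on a general and a specialized test.
general = [20, 25, 30, 35, 40]
specialized = [12, 11, 17, 16, 21]
r = pearson_r(general, specialized)
```

A coefficient near 1.0 would indicate that the two tests rank pupils in nearly the same order, while one near .30 would indicate only slight agreement.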

(b)  Rate.  Paul  Blommers  and  E.  F.  Lindquist19  surveyed  the  literature  on 
the  relationships  between  reading  rate  and  comprehension  and  concluded  that  the 
great  differences  obtained  by  investigators  resulted  from  the  use  of  measuring 

18A.  I.  Gates,  "A  Correlation  Study  of  a  Battery  of  Reading  Diagnostic 
Tests",  Journal  of  Educational  Research,  V.  40,  No.  6,  Feb.  1947,  p.p.  436-47. 

19P.  Blommers  and  E.  Lindquist,  "Rate  of  Comprehension  of  Reading",  Journal 
of  Educational  Psychology,  V.  35,  No.  8,  Nov.  1944,  p.p.  449-72. 
instruments not specifically designed for studying this relationship. Therefore
they  proceeded  to  develop  a  test  for  measuring  rate  of  comprehension  of  reading, 
and  employed  it  to  study  the  relationship  between  rate  of  comprehension  and 
power  of  comprehension.  The  unique  characteristics  of  the  test  developed  were: 
(1)  the  control  of  reading  purpose;  (2)  the  use  of  only  the  time  spent  in 
comprehending at a certain level in deriving the rate score; and (3) the technique
of combining a set of relative rate scores to obtain a composite rate
score.
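
The second characteristic, a rate score derived only from time spent on material actually comprehended, may be sketched as follows. This is an illustration under assumptions, not the scoring procedure of Blommers and Lindquist: each passage is assumed to carry a word count, a reading time, and a pass or fail on its comprehension check, and the function name and figures are hypothetical.

```python
def rate_of_comprehension(passages):
    """Words per minute over passages whose comprehension check was passed.

    Each passage is a (word_count, seconds_spent, comprehended) tuple.
    Returns None when no passage was comprehended.
    """
    words = sum(w for w, s, ok in passages if ok)
    seconds = sum(s for w, s, ok in passages if ok)
    if seconds == 0:
        return None
    return words / (seconds / 60.0)

# Hypothetical record for one pupil: the third passage was read quickly
# but its comprehension check was failed, so it is left out of the rate.
record = [(200, 60, True), (300, 90, True), (250, 30, False)]
rate = rate_of_comprehension(record)
```

For this record the rate is 200 words per minute, whereas counting all three passages regardless of comprehension would give 250; the difference shows why the two kinds of rate score need not agree.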

The  test  was  administered  to  six  hundred  seventy-two  eleventh  and  twelfth 
grade  pupils  in  four  middle-sized  Iowa  high  schools.  The  principal  findings 
were  as  follows:  (1)  the  relationship  between  rate  of  reading  comprehension 
and  power  of  reading  comprehension  is  significant  but  low,  the  correlation 
(within-grades within-schools) being approximately .30; (2) an individual
tends to maintain approximately the same rank in rate of successful reading in
a given group despite differences such as are usually found in the difficulty
and nature of the individual selections used in reading comprehension tests;
(3) good comprehenders adjust their rate of reading by slowing down as the
material increases in difficulty, whereas poor comprehenders apparently read
easy and difficult materials at much the same rate; (4) significant differences
were found between measures of reading rate based only upon materials
comprehended and measures of reading rate based upon materials not comprehended
or upon a mixture of comprehended and uncomprehended materials. No relationship
was found between rate based on uncomprehended materials and power of reading
comprehension; (5) when the experimental reading rate test here described was
used as a criterion, the validities of certain existing rate tests were low;
(6)  rate  scores  which  in  part  measure  comprehension  are  poor  measures  of  read¬ 

(c) Vocabulary. In 1942 Cronbach20 reviewed the literature on diagnostic
vocabulary  testing  and  listed  the  types  of  word  knowledge  usually  tested  for  as 
definition  of  words,  application  of  the  term,  breadth  of  meaning,  precision  of 
meaning,  and  the  ability  to  use  a  word  in  discourse.  Most  studies  avoided 
diagnostic  testing  for  breadth  and  precision  because  of  interest  in  estimating 
the  size  of  a  pupil's  vocabulary.  Too  often,  he  says,  a  testee  can  obtain 
credit  without  actually  knowing  a  word.  After  discussing  other  weaknesses  of 
vocabulary  tests,  he  concluded  that  there  was  a  need  for  more  valid  tests, 
especially  with  respect  to  breadth  and  precision  of  meaning. 

(d) Comprehension. A study of the factors involved in reading comprehension
was conducted by Frederick B. Davis21. According to Davis, the factors
measured in tests of reading comprehension are, first, two general factors:
(1) word meaning, and (2) reasoning in reading, which involves facility in
weaving  together  several  ideas  and  showing  their  relationships,  and  ability  to 
draw  correct  inferences  from  the  writer's  statements.  Second,  the  specific 
factors  analyzed  by  Davis  were  (1)  ability  to  determine  the  writer's  purpose, 
intent  or  point  of  view;  (2)  ability  to  understand  the  writer's  explicit 
statements or to get the literal meaning; (3) ability to follow the organization
of a passage and to identify antecedents and references in it; (4) ability
to  select  the  main  thought  of  a  passage;  (5)  ability  to  determine  from  context 
the  meaning  of  an  unfamiliar  word  or  to  select  an  appropriate  meaning,  and 
(6)  ability  to  determine  the  tone  and  mood  implicit  or  explicit  in  a  passage. 
The  analysis  of  Davis  has  been  challenged  by  L.  L.  Thurstone22  who  uses  a 

20L. J. Cronbach, "Diagnostic Vocabulary Testing", Journal of Educational
Research,  V.  36,  No.  3,  Nov.  1942,  p.p.  206-18. 

21F. B. Davis, "What Do Reading Tests Really Measure", Summarized in
Journal  of  Educational  Research,  V.  40,  No.  5,  Jan.  1947,  p.p.  391-2. 

22Ibid., p. 392.
different method of factor analysis and who claims to have found one general
factor  in  reading  comprehension  rather  than  the  numerous  factors  discovered  by 
Davis.  The  contradiction  in  the  findings  of  these  two  investigators  indicates 
the  need  for  further  experimentation  and  factor  analysis  of  the  components  of 
reading  comprehension. 

CHAPTER II


EXPERIMENTAL  DESIGN 

THE  DESIGN  OF  THE  EXPERIMENT 

In  this  study  it  is  proposed  to  give  five  standardized  silent  reading 
tests  to  each  of  a  group  of  two  hundred  subjects  from  grades  four,  five,  and 
six  in  the  Delton  and  Fairview  Schools  in  Edmonton,  Alberta,  in  an  effort  to 
determine the diagnostic values of each test. In this connection a statistical
analysis of the raw scores will be made. A rational and statistical
analysis  of  the  transmuted  grade  scores  for  each  of  the  rate,  vocabulary,  and 
comprehension sub-tests will also be completed. An evaluation of the diagnostic
capacities of the tests will be carried out through the application of
criteria  selected  from  related  studies.  These  procedures  are  discussed  in 
more  detail  in  the  succeeding  sections  of  this  chapter. 
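
The statistical analysis of the raw scores is concerned with how nearly each distribution approaches the normal form. One common way of describing departure from normality, offered here only as an assumed illustration and not necessarily the procedure followed in this study, is to compute the skewness and the excess kurtosis of the scores, both of which are zero for a normal distribution. The function name and the scores below are hypothetical.

```python
from statistics import mean

def skewness_and_excess_kurtosis(scores):
    """Moment-based skewness and excess kurtosis of a list of raw scores."""
    n = len(scores)
    m = mean(scores)
    # Second, third, and fourth central moments (population form).
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis

# Hypothetical raw scores: a symmetric set and a set with a long upper tail.
symmetric = [1, 2, 3, 4, 5]
long_upper_tail = [1, 1, 1, 2, 10]
sym_skew, sym_kurt = skewness_and_excess_kurtosis(symmetric)
tail_skew, tail_kurt = skewness_and_excess_kurtosis(long_upper_tail)
```

A positive skewness, as in the second set, would suggest that a test is too difficult for the group, the scores piling up at the low end; a negative value would suggest the reverse.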

THE  TESTS 

The five silent reading tests1 to be administered for the purpose of
securing the data for this study are listed as follows:

1.  Diagnostic  Reading  Test  by  S.  L.  and  L.  C.  Pressey,  Grades  3  to  9, 
Form  A,  1927. 

2. Gates Reading Survey by A. I. Gates, Grades 3 (2nd half) to 10,
Form 1, 1942.

3. Sangren-Woody Reading Test by P. V. Sangren and C. Woody, Grades 4
to 8, Form A, 1928.


1See page 44 of this study.
4. Iowa Silent Reading Tests by H. A. Greene and V. N. Kelley,
Grades 4 to 8, Elementary Form A (revised), 1939.

5. Diagnostic Examination of Silent Reading Abilities by J. Van Wagenen
and A. Dvorak, Grades 4 and 5, Intermediate, Grades 6 to 8, Junior,
Form M, 1939.

Kottmeyer2  states  that  a  diagnostic  test  differs  from  a  survey  test  in  that 
it gives separate scores for each of a number of reading abilities while a survey
test gives but a single total score. Each of the five tests selected for
this study meets this criterion. However, the tests by Pressey and by Gates
give only the three basic scores plus a total score and do not provide individual
profile charts. They are included in the group in order that a comparison
of  the  diagnostic  value  of  such  tests  may  be  made  with  that  of  tests  which  are 
wider  in  scope.  Both  the  Sangren-Woody  and  the  Iowa  Silent  Reading  tests  are 
much wider in scope than those by Pressey and Gates, but are more limited in
depth  in  that  the  parts  are  shorter  and  specific  time  limits  are  set  for  each. 
These  two  tests,  while  providing  a  greater  number  of  scores,  can  be  administered 
in only 27 and 45 minutes respectively. The Diagnostic Examination by Van
Wagenen and Dvorak, on the other hand, requires about 2 hours to administer and,
except for the rate sub-test, imposes no specific time limits. Because of its
length  it  is  divided  into  three  parts  and  may  be  administered  in  several  sittings 
so  long  as  ample  time  is  provided  for  every  testee  to  finish  as  much  as  he  or 
she  can. 

Table  I3  illustrates  the  scope  of  each  test  as  well  as  the  variation  among 
its  component  parts.  The  tests  are  arranged  in  order  with  those  having  the 

2W.  Kottmeyer,  Handbook  for  Remedial  Reading,  p.  79. 

3See  page  13  of  this  study. 
fewest sub-tests first. It will also be noted that although each test gives
scores  for  rate  and  vocabulary  the  term  comprehension  is  not  used  to  designate 
part  of  the  Pressey  or  Sangren-Woody  tests.  On  the  Pressey  test  the  score  for 
paragraph  meaning  was  used  for  comprehension,  while  on  the  Sangren-Woody  and 
Iowa tests the total of the five bracketed parts was taken for this measure.
Similarly the comprehension score on the Van Wagenen-Dvorak tests is a composite
of the parts indicated by the bracket. The variety exhibited among the component
parts illustrates the difficulty of preparing a test to measure comprehension.


TABLE I. SCORES GIVEN BY THE TESTS


Pressey            Gates          Sangren-Woody           Iowa                       Van Wagenen-Dvorak

Rate               Rate           Rate                    Rate                       Rate
Vocabulary         Vocabulary     Vocabulary              Vocabulary                 Vocabulary
Paragraph Meaning  Comprehension  [Total Meaning]         [Comprehension]            Comprehension
Total Score        Total Score    [Fact Material]         [Directed Reading]         Perception of Relations
                                  [Central Thought]       [Sentence Meaning]         Range of Information
                                  [Following Directions]  [Paragraph Comprehension]  [Central Thought]
                                  [Organization]          [Alphabetizing]            [Clearly Stated Details]
                                  Total Score             Use of Index               [Interpretation]
                                                          Median Score               [Integration of Dispersed Ideas]
                                                                                     [Drawing Inferences]
                                                                                     Reading Index

Parts shown in brackets are those combined to give the comprehension score.

Table II shows the number of items in each test for the three basic sub-tests.
Time limits for rate are included in order to facilitate comparison.
It will be noted that the Sangren-Woody and Iowa tests provide no tasks for
the reader to perform as a check on comprehension of the sections read to give
a rate of reading score. These two also employ one-minute intervals to time
the rate tests. The effect of these and other differences in the composition
of the tests will, it is hoped, become apparent once the tests have been administered
and the data have been compiled.


TABLE II. COMPARISON OF NUMBERS OF ITEMS IN EACH TEST FOR
THE THREE BASIC SUB-TESTS.


                    |            Rate              | Vocab.      |    Comprehension     | No. of
Test                | No. of   No. of   Time       | No. of      | No. of      No. of   | Other
                    | Words    Tasks               | Items       | Paragraphs  Tasks    | Scores

Pressey             | 3300     226      25 min.    | 100         | 60 short    60       | 1
Gates               | 1800     64       10 min.    | 85          | 35 short    83       | 1
Sangren-Woody       | 425      0        1 min.     | 40          | 44 short    66       | 6
Iowa                | 860      0        1 min.     | 55          | 40 long     90       | 7
                    |                   (2 tests)  |             |                      |
Van Wagenen-Dvorak  | 1680     56       5 min.     | [illegible] | 23 long     100      | 8

THE SUBJECTS

Of  the  two  hundred  pupils  selected  for  this  study  approximately  one  hundred 
sixty  were  from  the  Delton  school.  The  remaining  forty  pupils  were  from  the 
Fairview  school  which  serves  a  more  outlying  area  where  the  socio-economic 
status  of  the  population  is  somewhat  lower  and  where  the  reading  stimulus  from 
books at home is assumed to be correspondingly weaker. Although children of
Anglo-Saxon  descent  are  in  the  majority,  both  groups  contain  a  large  number  of 
children  of  foreign  extraction,  particularly  Ukrainian.  Because  the  two  schools 
serve  contiguous  districts  the  investigator,  who  is  principal  of  Fairview  school, 
was  able  to  supervise  the  administration  of  all  tests,  thus  assuring  a  high 
degree  of  uniformity  in  this  regard.  Since  the  primary  purpose  of  this  study 
is  to  determine  the  diagnostic  value  of  certain  group  silent  reading  tests,  not 
to  challenge  their  validity  nor  to  prepare  norms  for  local  use,  considerations 
of  socio-economic  status,  intelligence  and  other  population  variables,  are  not 
considered  to  be  of  major  concern.  Consequently  it  is  an  advantage  in  this  type 
of study to have a group of pupils in whom differences in language backgrounds,
socio-economic status, intelligence, and reading stimulus at home may contribute
to  the  complexity  of  reading  factors  when  one  is  attempting  to  determine  the 
diagnostic  sensitivity  of  reading  tests. 

The tests were administered in March during the two weeks immediately preceding
the Easter recess. In order to minimize the influence of fatigue the
tests  were  administered  in  short  sittings  in  accordance  with  the  directions  in 
the  manuals.  Through  the  use  of  this  procedure  pupils  who  lost  a  half-day  of 
school  missed  part  of  one  test.  As  a  result,  forty  cases  were  lost,  leaving 
one hundred sixty pupils who took all parts of all tests. Of this number, forty-five
were in grade four, sixty-two were in grade five, and fifty-three were in
grade  six.  The  division  of  sexes  was  almost  even.  The  scores  of  these  one 
hundred  sixty  subjects  only  were  used  as  data  for  the  statistical  analysis. 

CRITERIA  FOR  EVALUATING  THE  DIAGNOSTIC  SENSITIVITY  OF  THE  TESTS 

The concluding part of the rational analysis will consist of an application
of a number of criteria for evaluating diagnostic reading tests. The
criteria will be taken from the related studies and from the writings of recognized
experts in the field of reading. In order to facilitate this analysis
the  criteria  will  be  grouped  under  the  headings  of  general  criteria,  and 
criteria  of  rate,  vocabulary,  and  comprehension. 

(a) General. That reading is a composite of skills is generally accepted,
but the exact number and nature of these components is a matter of controversy.
According to Burkart's⁴ summary there are eighty-nine separate abilities and
skills in the reading process. Obviously no one test could adequately measure
this number of factors, and it becomes necessary for the maker of a test to
limit the scope of his tests to a few selected areas. Since the cause of
disability in any individual is often plural, the test which gives the widest
possible scope and is of the greatest depth would seem to be of most value as
a diagnostic instrument. Each of the five tests selected for this study covers
the three basic areas of speed, comprehension, and word meaning, but considerable
variation may be observed among the additional components of the last three
tests.⁵ Another criterion is proposed by Dolch, who declares that "present tests
usually covers a wide range of grades, and the result is that the individual
teacher can secure little help from the small section of such a test really
suited to her own group of children"⁶. There are specifics in reading ability
which cannot be tested adequately by survey tests that cover a wide range of
grades. Therefore, the use of such tests for diagnostic purposes is limited
by this circumstance. Finally, the tests should possess a high degree of
validity. That is, tests should have the capacity to discriminate among pupils with
differing reading abilities in the various phases of reading that they purport
to test.

⁴K. H. Burkart, "An Analysis of Reading Abilities", Journal of Educational
Research, V. 38, No. 6, Feb. 1945, pp. 430-39.

⁵See Table I, page 13 of this study.

From  the  foregoing  discussion  the  following  general  criteria  are  summarized: 

(a)  the  scope  of  the  test  should  be  wide; 

(b)  the  parts  should  be  long  enough  to  test  adequately  the  areas  included; 

(c)  it  should  test  some  specific  thing  which  can  be  taught; 

(d) the diagnostic value of a test is limited by the amount of material
it includes that is suited to each grade level within the range of grades it
covers;

(e)  it  should  possess  a  high  degree  of  validity. 

(b) Rate. Blommers and Lindquist⁷ point out the difficulty of comparing
scores on tests where materials are not equivalent in content and are not read
for equivalent purposes. In measuring rate of comprehension they regard the
purpose for reading and the time required to accomplish this purpose as the
basic criteria for judging tests. If no tasks are set for the reader, or if
he is not required to answer questions on what has been read, it is not possible
to prevent skipping or scanning. This contention is supported by Traxler who
states that:

The main difficulty is caused by the fact that it is necessary
to check on comprehension in some way, since rapid movement of the
eyes over the material without understanding is futile and cannot
be called reading in the generally accepted meaning of the term.

The need to measure comprehension as well as speed has led to the
evolution of at least five kinds of rate tests, but in none of these
is the combination of speed and questions entirely satisfactory.⁸

⁶E. W. Dolch, op. cit., p. 209.

⁷P. Blommers and E. Lindquist, "Rate of Comprehension of Reading", Journal
of Educational Psychology, V. 35, No. 8, Nov. 1944, pp. 449-72.

The  two  most  common  procedures  are  to  place  the  questions  or  tasks  at  the 
end  of  short  paragraphs  or  to  insert  them  in  a  running  form.  In  any  case  they 
are usually simple in order that no time be lost in thinking out answers.
Placing questions at the end of longer passages requires the pupil to recall what
he  has  read  and,  therefore,  tends  to  give  a  measure  of  memory  span  rather  than 
of  reading  rate. 

The two types of timing commonly employed are one-minute intervals and
continuous periods of from five to fifteen minutes. When one-minute intervals
are used it is possible for the reader to force himself to read more rapidly
than he habitually does, so the purpose of the test may be defeated. Where
tasks are interwoven with the material to be read or are placed at the end of
each short paragraph, the longer reading time would seem preferable because it
more closely approximates the true life situation. Traxler⁹ investigated the
relationship between the length and reliability of a test of rate of reading
and concluded that a speed of reading test which continues for four hundred
seconds is much more reliable than one for only three hundred seconds or less.

⁸A. E. Traxler, "Problems of Measurement of Reading Ability", The School
Review, V. 52, No. 8, Oct. 1944, p. 493.

⁹A. E. Traxler, "The Relationship Between the Length and Reliability of
a Test of Rate of Reading", Journal of Educational Research, V. 32, No. 1,
Sept. 1939, p. 2.

The  following  criteria  for  tests  of  reading  rate  may  now  be  set  up: 

(a)  the  test  should  be  of  sufficient  length  to  provide  for  a  continuous 
reading  period  of  five  minutes  or  more; 

(b)  simple  questions  or  tasks  should  be  provided  as  a  check  on  the 
pupil’s  reading; 

(c)  the  questions  or  tasks  should  be  close  to  the  source  of  information 
and  not  at  the  end  of  the  selection. 

(c) Vocabulary. Tests of vocabulary generally range in difficulty from
simple to very difficult items, and are usually based on a reliable document
such as Thorndike's Word List¹⁰. In order to prevent other factors from
influencing the results of the tests the items should be presented in isolation
rather than in context, "... because children whose facility in the organizations
of language is limited, may become confused in reading even though they
possess adequate vocabularies"¹¹. That the recognition of words in context
is influenced by other factors is borne out by Gates who declares that "...
difficulty in utilizing the context is often accompanied by inability to appreciate
properly the grouping of thought units and it is not easy to tell which
of these is the cause of the other"¹².

When  words  are  presented  in  isolation  the  time  should  be  sufficient  for 
all  but  the  slowest  to  finish  in  order  that  the  influence  of  reading  rate  be 
minimized.  The  test,  then,  will  give  a  measure  of  the  range  of  a  pupil’s  sight 
vocabulary. 

¹⁰E. L. Thorndike, The Teacher's Word Book.

¹¹M. Monroe, Children Who Cannot Read, p. 109.

¹²A. I. Gates, The Improvement of Reading Instruction, p. 130.

Cronbach¹³ concludes that a pupil's knowledge of the breadth of meaning of
a word can be measured only when it is presented several times in a test according
to the number of meanings attributed to it. He also states that diagnostic
tests should be devised which will provide knowledge of a pupil's ability to
make fine distinctions. A test which does not seek to measure breadth or precision
of meaning tends to give an estimate of the size of one's vocabulary
rather than to be diagnostic in character.

From the foregoing discussion the following criteria for tests of vocabulary
may be summarized:

(a)  the  items  should  range  in  difficulty  from  simple  to  very  difficult 
for  the  grades  covered; 

(b)  the  items  should  be  presented  in  isolation,  or  in  a  form  which  provides 
no  clue  to  their  meanings; 

(c)  sufficient  time  for  all  to  finish  should  be  provided; 

(d)  breadth  and  precision  of  meaning  should  be  measured. 

(d) Comprehension. In setting up criteria for evaluating tests of comprehension
the investigator regards the level of comprehension as the prime
factor. Gates defines level of comprehension as "the highest degree of difficulty
or complexity in a passage which a pupil can comprehend"¹⁴. This implies that a
test should consist of selections graded in difficulty. The second factor to be
considered is whether the test poses a variety of tasks covering several types
of comprehension or whether accuracy in a single type of response is measured.
Adequate coverage of general comprehension demands that a wide range of types
of reading be included in a test. Finally, the length of the selection and
the time allowed for completion of the tasks must be considered. In this
connection Gates declared that "... it is advisable to eliminate the influence
of speed of reading as much as possible"¹⁵. Another argument for eliminating
the influence of speed comes from Carlson¹⁶, who concludes that at the middle
and lower levels of intelligence the slower reader tended to be the better
reader. This tendency was accentuated when the purposes for reading were more
exacting and as the difficulty of the material increased. A similar conclusion
was made by Blommers and Lindquist¹⁷, who asserted that good comprehenders adjust
their speed to the material and to the purpose for which it is being read.

¹³L. J. Cronbach, "Diagnostic Vocabulary Testing", Journal of Educational
Research, V. 36, No. 3, Nov. 1942, pp. 206-18.

¹⁴A. I. Gates, op. cit., p. 405.

A summary of the criteria of comprehension follows:

(a)  the  test  should  consist  of  selections  of  increasing  difficulty; 

(b)  it  should  test  for  a  wide  range  of  types  of  reading; 

(c)  a  variety  of  questions  should  be  used; 

(d)  the  selection  should  be  long  enough  to  tax  the  abilities  of  the 
better  readers; 

(e)  sufficient  time  for  all  to  finish  should  be  provided. 

STATISTICAL  ANALYSIS 

In his discussion on the use of the normal probability curve in mental
measurement, Garrett¹⁸ lists reading, as measured by standardized tests, among
the phenomena which tend to follow the normal pattern. Because calculations
based on raw scores rather than derived scores are generally regarded as being
more suitable for determining the normalcy of a distribution, the actual scores
made by the one hundred sixty subjects on each of rate, vocabulary and comprehension
for the five tests will be used in the preparation of frequency tables
from which measures of skewness and kurtosis will be obtained. The technique
for calculating skewness and kurtosis from percentiles will be used. The significance
of the obtained measures will then be tested through the calculation
of "t" scores and the use of corresponding tables of significance. The mean,
standard deviation, and range of each distribution will also be calculated.
Separate tables for the data relative to the significance of the skewness and
kurtosis for each distribution will be presented. The statistical evaluation
will then be made on the basis of the significance of these measures for each
test.

¹⁵A. I. Gates, op. cit., p. 406.

¹⁶T. R. Carlson, "The Relationship Between Speed and Accuracy of Comprehension",
Journal of Educational Research, V. 42, No. 4, Mar. 1949, pp. 500-12.

¹⁷P. Blommers and E. Lindquist, "Rate of Comprehension of Reading", Journal
of Educational Psychology, V. 35, No. 8, Nov. 1944, p. 471.

¹⁸H. E. Garrett, Statistics in Psychology and Education, p. 3.07.


ANALYSIS  BASED  ON  GRADE  SCORES 

Each  of  the  five  tests  provides  for  transmutation  of  all  raw  scores  to 
grade  scores  which  are  expressed  in  decimal  fractions  ranging  from  grade  1.0  to 
grade 13.0. Frequency tables of grade scores will be constructed for rate,
vocabulary, and comprehension for each of the five tests, thus permitting
comparison of the sub-tests. However, because these scores are transmuted, and because
the investigator intends to use a statistical approach in evaluating the raw
scores, computations based on grade scores will be limited to the calculation
of means and standard deviations. The average of the means of the five sub-tests
will then be calculated for each of rate, vocabulary, and comprehension, and
these averages will be used as criterion scores for rating the tests. The ones which deviate
most from these averages will be suspect, while those which tend to cluster about
the criterion scores will be regarded more favorably. This evaluation, coupled
with consideration of the range of the grade scores, will constitute the analysis
to be made from the tabular presentation of the grade scores.

CHAPTER III


STATISTICAL  ANALYSIS 

ANALYSIS  OF  DATA  FROM  THE  RAW  SCORES 

The  tests  listed  on  page  eleven  of  this  study  were  administered  during  the 
period  from  the  12th  to  the  22nd  of  March,  1951.  The  raw  scores  made  by  the 
subjects  on  rate,  vocabulary  and  comprehension  for  the  five  tests  were  arranged 
in seventeen frequency tables. Two additional tables were needed for the Van
Wagenen-Dvorak Test which provided separate forms for grade six vocabulary and
comprehension.  The  mean,  standard  deviation,  and  range  were  calculated  for  each 
distribution. 

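In order to compute the estimates of skewness and kurtosis the 10th, 25th,
50th, 75th and 90th percentiles of each distribution were calculated by the
formula¹

$$P_p = l + \left(\frac{pN - F}{f_p}\right) \times i,$$

where $l$ is the exact lower limit of the interval containing the percentile,
$pN$ the number of scores to be counted off, $F$ the sum of the frequencies
below $l$, $f_p$ the frequency within that interval, and $i$ the length of the
interval. Using these percentiles a measure of the skewness of each
distribution was computed by applying the formula²

$$Sk = \frac{P_{90} + P_{10}}{2} - P_{50}.$$

Similarly the kurtosis was computed by applying the formula³

$$Ku = \frac{Q}{P_{90} - P_{10}},$$

where $Q$ is the quartile deviation, $(P_{75} - P_{25})/2$.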
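The percentile method described above may be sketched in Python as follows. This is a minimal illustration and not the investigator's own computation: the function names and the small frequency table are hypothetical, and the table was chosen to be symmetric so that the skewness comes out at zero.

```python
from typing import List, Tuple

# Each interval is (exact lower limit l, interval length i, frequency f).
Interval = Tuple[float, float, int]


def grouped_percentile(intervals: List[Interval], p: float) -> float:
    """Return P_p = l + ((pN - F) / f_p) * i for a grouped frequency
    distribution whose intervals are listed in ascending order."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    n = sum(freq for _, _, freq in intervals)
    target = p * n  # pN: the number of scores to be counted off
    below = 0       # F: the scores lying in the intervals below l
    for lower, length, freq in intervals:
        if freq > 0 and below + freq >= target:
            return lower + (target - below) / freq * length
        below += freq
    raise ValueError("the distribution contains no scores")


def skewness(p10: float, p50: float, p90: float) -> float:
    """Sk = (P90 + P10) / 2 - P50."""
    return (p90 + p10) / 2 - p50


def kurtosis(p10: float, p25: float, p75: float, p90: float) -> float:
    """Ku = Q / (P90 - P10), with Q = (P75 - P25) / 2."""
    q = (p75 - p25) / 2
    return q / (p90 - p10)


# Hypothetical symmetric distribution of 60 scores in intervals of length 5.
frequencies = [2, 6, 12, 20, 12, 6, 2]
table = [(9.5 + 5 * k, 5.0, f) for k, f in enumerate(frequencies)]

p10, p25, p50, p75, p90 = (
    grouped_percentile(table, p) for p in (0.10, 0.25, 0.50, 0.75, 0.90)
)
sk = skewness(p10, p50, p90)
ku = kurtosis(p10, p25, p75, p90)
print(f"Sk = {sk:.3f}, Ku = {ku:.3f}")
```

For this made-up table the median falls at 27.0, the skewness is zero, and the kurtosis is .25.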

It should be noted here that in a normal curve the skewness is zero, and
according to Garrett⁴ the kurtosis is .263. Therefore, any measure which deviates
widely from either of these values should be regarded with suspicion, but the
difference may not be considered significant until proven so by means of the usual
"t" test.

¹H. E. Garrett, Statistics in Psychology and Education, p. 78.

²Ibid., p. 120.

The standard error of each measure of skewness for the seventeen sub-tests
in this study was obtained by using the formula⁵
$\sigma_{Sk} = \frac{.5185\,D}{\sqrt{N}}$, where $D = P_{90} - P_{10}$. Each measure of
skewness was then divided by its standard error to give a "t" score, the
significance of which was determined by reference to a table of "t"⁶.

Similarly the "t" score for each measure of kurtosis was obtained by dividing
the measure by its standard error, which was obtained from the formula⁷
$\sigma_{Ku} = \frac{.27779}{\sqrt{N}}$. The significance of these obtained measures was then tested
through reference to a table of "t".

The  significance  of  the  measures  of  skewness  and  kurtosis  was  determined 
at  the  .05  and  the  .01  levels.  At  the  risk  of  being  in  error  once  in  twenty 
times  at  the  .05  level  and  once  in  one  hundred  times  at  the  .01  level  measures 
of  skewness  and  kurtosis  attaining  or  exceeding  these  limits  must  be  attributed 
to  factors  other  than  accidents  of  sampling. 
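The significance test just described may be sketched as follows. Several points here are assumptions made for illustration only: the function names are hypothetical; the critical values 1.96 and 2.58 are the usual large-sample approximations rather than the entries of the table of "t" actually consulted; and for kurtosis the sketch divides the deviation of the measure from the normal value of .263 by the standard error, which is one reading of the procedure and is not spelled out in the text.

```python
import math

NORMAL_KURTOSIS = 0.263  # value of Ku for a normal curve, as cited from Garrett


def se_skewness(d: float, n: int) -> float:
    """Standard error of Sk: .5185 * D / sqrt(N), with D = P90 - P10."""
    return 0.5185 * d / math.sqrt(n)


def se_kurtosis(n: int) -> float:
    """Standard error of Ku: .27779 / sqrt(N)."""
    return 0.27779 / math.sqrt(n)


def significance(t: float, crit_05: float = 1.96, crit_01: float = 2.58) -> str:
    """Label a "t" score using assumed large-sample critical values."""
    t = abs(t)
    if t >= crit_01:
        return "Sig. at .01"
    if t >= crit_05:
        return "Sig. at .05"
    return "not significant"


# Pressey Vocabulary sub-test, Table IV: Ku = .35 with N = 160.
t_ku = (0.35 - NORMAL_KURTOSIS) / se_kurtosis(160)
label = significance(t_ku)
print(f"t = {t_ku:.2f}: {label}")
```

Under these assumptions the Pressey Vocabulary kurtosis of .35 gives a "t" of about 3.96, which agrees with the entry "Sig. at .01" in Table IV. Because the tabled measures are rounded to two decimals, borderline cases need not reproduce exactly.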

The data relative to the skewness and kurtosis of each of the seventeen
distributions have been compiled in tabular form and are here presented in
Tables III⁸ and IV⁹.


⁵Ibid., p. 220.

⁶Ibid., p. 191.

⁷Ibid., p. 222.

⁸See page 26 of this study.

⁹See page 27 of this study.


TABLE III. TEST OF SIGNIFICANCE OF SKEWNESS OF DISTRIBUTIONS OF RAW
SCORES MADE BY ONE HUNDRED SIXTY SUBJECTS ON EACH OF THE
BASIC COMPONENTS OF THE FIVE READING TESTS.


Tests                          Mean    Sigma   Range   Skewness   Level of Significance

Pressey
  Rate                        33.18     8.43   10-66      4.7     Sig. at .01
  Vocab.                      46.3     13.75   13-83      1.32    ------
  Comp.                       27.94    12.52    3-55     -2.13    ------

Gates
  Rate                        33.86    10.88   10-63      2.28    Sig. at .05
  Vocab.                      39.85    10.92   10-66       .07    ------
  Comp.                       49.5     12.12   15-74     -1.87    ------

Sangren-Woody
  Rate                        20.84     7.32    5-41      -.41    ------
  Vocab.                      19.66     5.79    2-38       .32    ------
  Comp.                       29.02    11.4     3-59      2.1     ------

Iowa
  Rate                        27.38    10.32    6-59      1.24    ------
  Vocab.                      20.27     7.8     5-44      2.18    Sig. at .05
  Comp.                       48.35    17.4    13-88      6.7     Sig. at .01

Van Wagenen-Dvorak
  Rate                        18.74     6.00    2-41      1.00    ------
  Intermediate, grades 4 & 5
    Vocab.                    17.03     8.07    2-38      2.6     Sig. at .05
    Comp.                     49.06    18.95    8-83     -2.35    ------
  Junior, grade 6
    Vocab.                    15.96     5.00    7-29      1.08    ------
    Comp.                     46.4     11.16   22-70      1.52    ------

TABLE IV. TEST OF SIGNIFICANCE OF KURTOSIS OF DISTRIBUTIONS OF RAW
SCORES MADE BY ONE HUNDRED SIXTY SUBJECTS ON EACH OF THE
BASIC COMPONENTS OF THE FIVE READING TESTS.


Tests                          Mean    Sigma   Range   Kurtosis   Level of Significance

Pressey
  Rate                        33.18     8.43   10-66       .27    ------
  Vocab.                      46.3     13.75   13-83       .35    Sig. at .01
  Comp.                       27.94    12.52    3-55       .27    ------

Gates
  Rate                        33.86    10.88   10-63       .27    ------
  Vocab.                      39.85    10.92   10-66       .28    ------
  Comp.                       49.5     12.12   15-74       .23    ------

Sangren-Woody
  Rate                        20.84     7.32    5-41       .27    ------
  Vocab.                      19.66     5.79    2-38       .27    ------
  Comp.                       29.02    11.4     3-59       .29    ------

Iowa
  Rate                        27.38    10.32    6-59       .25    ------
  Vocab.                      20.27     7.8     5-44       .27    ------
  Comp.                       48.35    17.4    13-88       .28    ------

Van Wagenen-Dvorak
  Rate                        18.74     6.00    2-41       .26    ------
  Intermediate, grades 4 & 5
    Vocab.                    17.03     8.07    2-38       .29    ------
    Comp.                     49.06    18.95    8-83       .31    Sig. at .05
  Junior, grade 6
    Vocab.                    15.96     5.00    7-29       .36    Sig. at .01
    Comp.                     46.4     11.16   22-70       .22    ------

In Table III¹⁰ the measures of skewness that have a minus sign indicate
distributions which are skewed to the left. The distributions which
yielded a positive measure of skewness are skewed to the right, as illustrated in
Figures 1 and 2. Five of the sub-tests yielded significant degrees of positive
skewness, but only the Pressey Rate sub-test and the Iowa Comprehension
sub-test were found to be positively skewed at the .01 level. The distributions
for these two sub-tests are presented graphically in Figures 1 and 2.

Fig. 1. Distribution of scores for the
Pressey Rate sub-test.

Fig. 2. Distribution of scores for the
Iowa Comprehension sub-test.

¹⁰See page 26 of this study.

Skewness indicates that for some reason the children taking a test score
too high or too low. Kurtosis suggests that the scores do not cluster normally
about the mean. If too many of the children score at the centre of the
distribution the curve will be peaked or leptokurtic. If too few score near the centre
it will be flattened or platykurtic. In any case, extremes of either skewness
or kurtosis denote a departure of the local scores from the norms. This suggests
that there has been an error in normalization or that one's sample is poor.
However, when the majority of the sub-tests in a group such as are included in
this study yield normal distributions the sample may be regarded as fairly
representative. Consequently, to the degree that significant deviations occur,
one's confidence in the tests is reduced.

Table IV¹¹ reveals that only four of the sub-tests produced leptokurtic
distributions, but none of these was sufficiently peaked to be considered
significant. Of the thirteen sub-tests revealing platykurtic tendencies only
three were found to have yielded distributions significantly platykurtic at the
.05 level. Two of these, the Pressey Vocabulary and the Van Wagenen-Dvorak
Junior Vocabulary sub-tests, yielded distributions sufficiently platykurtic to be
regarded as significant at the .01 level. None of the Gates, Sangren-Woody or
Iowa sub-tests exhibited a significant degree of kurtosis. The distributions
for the Van Wagenen-Dvorak Intermediate Comprehension and Junior Vocabulary
sub-tests were found to be significantly platykurtic at the .05 and the .01 levels
respectively.
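The count of leptokurtic and platykurtic sub-tests can be checked directly from the kurtosis column of Table IV. The following is a minimal sketch; the dictionary and variable names are illustrative, and the classification rule (below .263 is leptokurtic, above .263 is platykurtic) follows the normal value cited from Garrett.

```python
NORMAL_KURTOSIS = 0.263  # Ku for a normal curve, as cited from Garrett

# Kurtosis measures copied from Table IV (seventeen distributions).
table_iv_kurtosis = {
    "Pressey Rate": 0.27,
    "Pressey Vocab.": 0.35,
    "Pressey Comp.": 0.27,
    "Gates Rate": 0.27,
    "Gates Vocab.": 0.28,
    "Gates Comp.": 0.23,
    "Sangren-Woody Rate": 0.27,
    "Sangren-Woody Vocab.": 0.27,
    "Sangren-Woody Comp.": 0.29,
    "Iowa Rate": 0.25,
    "Iowa Vocab.": 0.27,
    "Iowa Comp.": 0.28,
    "Van Wagenen-Dvorak Rate": 0.26,
    "Van Wagenen-Dvorak Intermediate Vocab.": 0.29,
    "Van Wagenen-Dvorak Intermediate Comp.": 0.31,
    "Van Wagenen-Dvorak Junior Vocab.": 0.36,
    "Van Wagenen-Dvorak Junior Comp.": 0.22,
}

# A measure below the normal value marks a peaked (leptokurtic) curve,
# a measure above it a flattened (platykurtic) curve.
leptokurtic = [name for name, ku in table_iv_kurtosis.items() if ku < NORMAL_KURTOSIS]
platykurtic = [name for name, ku in table_iv_kurtosis.items() if ku > NORMAL_KURTOSIS]

print(len(leptokurtic), "leptokurtic;", len(platykurtic), "platykurtic")
```

The tally gives four leptokurtic and thirteen platykurtic distributions, in agreement with the counts reported from Table IV.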

The Pressey Vocabulary sub-test was the only test taken by the entire group
of subjects to yield a significantly platykurtic distribution at the .01 level.
The distribution for this sub-test is presented graphically in Figure 3¹².


¹¹See page 27 of this study.

¹²See page 30 of this study.

Fig. 3. Distribution of raw scores for the
Pressey Vocabulary sub-test.

Fig. 4. Distribution of raw scores for the
Van Wagenen-Dvorak Junior Vocabulary sub-test.


The Van Wagenen-Dvorak Junior Vocabulary sub-test, which was taken by the
fifty-two grade six subjects, was the only other sub-test significantly
platykurtic at the .01 level. Despite the fact that a much smaller number of scores
is included, a graph of this distribution is also presented here.

The following is a summarization of the pertinent facts revealed by Tables
III and IV:

1. None of the seventeen distributions showed a significant degree of
negative skewness.

2. The five sub-tests that yielded distributions which showed a significant
degree of positive skewness were the Pressey Rate, the Gates Rate, the Iowa
Vocabulary, the Iowa Comprehension, and the Van Wagenen-Dvorak Intermediate
Vocabulary sub-tests.

3. None of the distributions proved to be significantly leptokurtic.

4. The three sub-tests which yielded significantly platykurtic distributions
were the Pressey Vocabulary, the Van Wagenen-Dvorak Intermediate Comprehension,
and the Van Wagenen-Dvorak Junior Vocabulary sub-tests.

5. All the tests except the Sangren-Woody Test yielded at least one
significantly deviant distribution.

6. None of the sub-tests yielded a distribution that showed a significant
degree of both skewness and kurtosis.

The tests may now be ranked according to their ability to yield normal
distributions of raw scores:

The Sangren-Woody Test ranks first because it yielded no significantly
deviant distributions. However, when the raw scores for this test were transmuted
to grade scores a basic weakness was discovered which will be discussed in a
later chapter.

The  Gates  Test  which  ranks  second  produced  two  normal  distributions  and  one 
that  showed  a  significant  degree  of  positive  skewness  at  the  .05  level. 

Three of the five Van Wagenen-Dvorak sub-tests yielded distributions which
deviated significantly from what might normally be expected. Two of these, the
Intermediate Vocabulary and Comprehension sub-tests, showed significant degrees
of skewness and platykurtosis respectively at the .05 level. The Junior
Vocabulary Test yielded a distribution which showed a significant degree of
platykurtosis at the .01 level. The Van Wagenen-Dvorak Tests rank third in the normalcy
of the distribution of raw scores since two of the five sub-tests yielded no
significantly deviant distributions and only one was found to be deviant at the
.01 level.

The Iowa Test ranks fourth in this evaluation. It yielded distributions
for vocabulary and comprehension that showed significant degrees of positive
skewness at the .05 and .01 levels respectively. Of the three distributions
obtained from its sub-tests only the one for rate was normal.

Because two of the three distributions yielded by the Pressey Test were
significantly deviant at the .01 level, it ranks last. The distribution for rate
proved to be positively skewed, while that for vocabulary yielded a significant
degree of platykurtosis.
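The study does not state a numerical rule for this ranking. One rule that happens to reproduce the order given above is to sort the tests by the proportion of their sub-tests that yielded significantly deviant distributions, and to break ties by the number deviant at the .01 level. The sketch below applies that rule; the rule itself and all names in the code are assumptions made for illustration, not the investigator's stated method.

```python
# For each test: (number of sub-tests, deviant at .05 only, deviant at .01),
# counted from Tables III and IV.
results = {
    "Pressey": (3, 0, 2),
    "Gates": (3, 1, 0),
    "Sangren-Woody": (3, 0, 0),
    "Iowa": (3, 1, 1),
    "Van Wagenen-Dvorak": (5, 2, 1),
}


def sort_key(item):
    """Assumed rule: proportion of deviant sub-tests, then count at .01."""
    _, (n_subtests, at_05, at_01) = item
    return ((at_05 + at_01) / n_subtests, at_01)


ranking = [name for name, _ in sorted(results.items(), key=sort_key)]
for place, name in enumerate(ranking, start=1):
    print(place, name)
```

Under this assumed rule the Iowa and Pressey Tests tie on proportion (two of three sub-tests deviant) and are separated only by the number of distributions deviant at the .01 level.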

These  conclusions  assume  large  proportions  in  a  study  of  this  type  which 
seeks  to  explore  the  diagnostic  values  of  a  group  of  silent  reading  tests,  because 
one  vital  quality  of  a  diagnostic  test  is  its  power  to  report  accurately  the 
reading  level  of  each  testee  in  the  various  types  of  reading  activity  for  which 
it provides scores. It seems reasonable to assume that the grade scores would
tend to be less accurate from tests which yielded significantly deviant
distributions of raw scores than from those that yielded normal distributions. Therefore,
the  sub-tests  in  this  study  which  yielded  significantly  deviant  distributions 
should  be  regarded  with  suspicion,  and  too  much  reliance  should  not  be  placed  on 
the  results  they  yield. 
During the rational analysis in Chapter V the criteria listed in Chapter II
of this study will be applied to the tests in an effort to search out any
structural weaknesses which might have contributed to the production of deviant
distributions. An estimate of the diagnostic values of the tests will then be made.

SUMMARY  OF  THE  FINDINGS 

The  following  is  a  summary  of  the  findings  of  this  chapter: 

1.  Too  much  reliance  should  not  be  placed  on  the  individual  scores  yielded 
by  the  sub-tests  which  yielded  significantly  deviant  distributions. 

2. When the tests were ranked according to their ability to yield normal
distributions the order of preference was:

(a) The Sangren-Woody Reading Test,

(b)  The  Gates  Survey  Test, 

(c)  The  Van  Wagenen-Dvorak  Tests, 

(d)  The  Iowa  Test, 

(e)  The  Pressey  Test. 


CHAPTER  IV 


RATIONAL  ANALYSIS 

ANALYSIS  OF  DATA  FROM  GRADE  SCORES 

Each  of  the  five  tests  included  in  this  study  makes  provision  for  the 
transmutation  of  raw  scores  to  grade  scores.  Therefore,  the  grade  scores  made 
by  each  of  the  one  hundred  sixty  subjects  in  rate,  vocabulary  and  comprehension 
were  recorded.  In  preparing  the  frequency  distributions  for  the  Van  Wagenen- 
Dvorak  Intermediate  and  Junior  Tests  the  grade  scores  for  the  two  vocabulary  and 
for  the  two  comprehension  tests  were  combined.  As  a  result  there  were  fifteen 
distributions  of  grade  scores. 

As in the case of the raw scores the mean, standard deviation, and range of
each distribution were computed, but no tests of normalcy were made. Instead,
the averages of the mean grade scores for rate, vocabulary and comprehension
were taken as criterion scores. In addition the mean grade score yielded by
each test was recorded in order that it might be compared with the general
consensus. This was done on the assumption that when a variety of tests, designed
to measure the same entity, and expressing the results in the same terms, are
averaged as to results, marked deviations from the consensus or average may be
viewed with suspicion as departing from the general tendency of the group. In
Table V¹ the criterion scores for rate, vocabulary and comprehension are reported
to be 6.26, 6.03 and 5.97 respectively, and the general consensus or average is
6.09. Similarly, averages were found for the columns of standard deviations and
ranges.
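The criterion-score procedure can be illustrated for the rate sub-tests, using the mean grade scores given in Table V. This is a minimal sketch with illustrative variable names; ordering the tests by the absolute size of their deviation is an assumption about how closeness to the consensus might be compared.

```python
# Mean grade scores for the rate sub-tests, copied from Table V.
rate_means = {
    "Pressey": 5.56,
    "Gates": 6.07,
    "Sangren-Woody": 7.02,
    "Iowa": 6.76,
    "Van Wagenen-Dvorak": 5.88,
}

# The criterion score is the average of the five means.
criterion = sum(rate_means.values()) / len(rate_means)

# Deviation of each test's mean from the criterion score.
deviations = {test: mean - criterion for test, mean in rate_means.items()}

# Tests ordered from closest to the criterion to farthest from it.
closest_first = sorted(deviations, key=lambda test: abs(deviations[test]))

print(f"criterion score for rate: {criterion:.2f}")
for test in closest_first:
    print(f"{test:20s} {deviations[test]:+.2f}")
```

The average of the five rate means is 6.26, the criterion score reported for rate. On this ordering the Gates rate mean lies closest to the criterion and the Sangren-Woody rate mean lies farthest from it.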


¹See page 35 of this study.


TABLE V. COMPARISON OF MEAN GRADE SCORE, STANDARD DEVIATION, AND
RANGE FOR EACH OF THREE PARTS OF THE FIVE TESTS.


Tests                            Mean      Sigma     Range

Pressey
   Rate                          5.56      2.04      2-11
   Vocabulary                    6.00      1.86      2-10
   Comprehension                 6.40      1.98      2-10
   Average                       5.99

Gates
   Rate                          6.07      2.20      2-12
   Vocabulary                    6.51      1.40      3-9
   Comprehension                 5.88      1.45      3-9
   Average                       6.15

Sangren-Woody
   Rate                          7.02      2.86      2-13
   Vocabulary                    6.71      1.78      2-11
   Comprehension                 6.41      1.72      2-11
   Average                       6.71

Iowa
   Rate                          6.76      3.56      1-13
   Vocabulary                    5.44      1.70      1-12
   Comprehension                 5.88      1.89      2-10
   Average                       6.03

Van Wagenen-Dvorak
   Rate                          5.88      2.06      2-12
   Vocabulary                    5.17      1.76      2-10
   Comprehension                 4.90      1.12      2-9
   Average                       5.28

Averages - Criterion Scores
   Rate                          6.26      2.54      1.8-12.1
   Vocabulary                    6.03      1.69      2.1-10.1
   Comprehension                 5.97      1.64      2.3-9.9
   Average                       6.09      1.96      2.1-10.7
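The averaging that produces a criterion score can be illustrated with a short
sketch. The Python below is not part of the original study. It assumes the five
mean grade scores for rate as they appear in Table V, averages them, and then
averages the three reported criterion scores to obtain the general consensus.
The names rate_means, criterion_score, and reported_criteria are illustrative
only.

```python
# Mean grade scores for the five rate sub-tests, transcribed from Table V.
rate_means = {
    "Pressey": 5.56,
    "Gates": 6.07,
    "Sangren-Woody": 7.02,
    "Iowa": 6.76,
    "Van Wagenen-Dvorak": 5.88,
}


def criterion_score(means):
    """Average the mean grade scores of tests that measure the same ability."""
    return sum(means.values()) / len(means)


# Criterion score for rate, rounded to two decimals as in the table.
rate_criterion = round(criterion_score(rate_means), 2)

# Criterion scores as reported in the text (not recomputed here).
reported_criteria = {"rate": 6.26, "vocabulary": 6.03, "comprehension": 5.97}

# General consensus: the average of the three criterion scores.
general_consensus = round(criterion_score(reported_criteria), 2)

print(rate_criterion)     # 6.26
print(general_consensus)  # 6.09
```

The same function could be applied to the columns of standard deviations in
order to obtain the averaged sigmas shown in the last block of the table.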
Table VI presents, as a percentage of the one hundred sixty cases, the number
of times that the grade score made by each subject on each of the three basic
sub-tests of a given test appeared as the extreme high, the middle, or the
extreme low grade score.


TABLE VI. PERCENTAGE OF TIMES THE SCORE MADE BY EACH SUBJECT ON EACH
OF RATE, VOCABULARY, AND COMPREHENSION WAS THE EXTREME
HIGH, THE EXTREME LOW, OR THE MIDDLE GRADE SCORE.


TEST                                 HIGH      LOW       MIDDLE

Rate
   Pressey                           10        13        35
   Gates                             15        16        28
   Sangren-Woody                     34        16        14
   Iowa                              30        33         8
   Van Wagenen-Dvorak I. & J.        13        22        25

Vocabulary
   Pressey                            9         9        30
   Gates                             33         3        21
   Sangren-Woody                     50         8        13
   Iowa                               4        28        23
   Van Wagenen-Dvorak I. & J.         5        52        15

Comprehension
   Pressey                           40        14        13
   Gates                             12         8        31
   Sangren-Woody                     36         4        20
   Iowa                              12        18        29
   Van Wagenen-Dvorak I. & J.         3        57        11
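One way to read Table VI is to total each column over the rate, vocabulary,
and comprehension rows for every test. The following Python sketch is an
illustration only and was not part of the study. It assumes the percentages as
transcribed from the table, and the names TABLE_VI, totals_by_test, and leaders
are hypothetical.

```python
# Table VI percentages as (high, low, middle) for rate, vocabulary, and
# comprehension, in that order. The Van Wagenen-Dvorak comprehension "middle"
# entry is taken to be 11, which is an assumption about the printed figure.
TABLE_VI = {
    "Pressey": [(10, 13, 35), (9, 9, 30), (40, 14, 13)],
    "Gates": [(15, 16, 28), (33, 3, 21), (12, 8, 31)],
    "Sangren-Woody": [(34, 16, 14), (50, 8, 13), (36, 4, 20)],
    "Iowa": [(30, 33, 8), (4, 28, 23), (12, 18, 29)],
    "Van Wagenen-Dvorak": [(13, 22, 25), (5, 52, 15), (3, 57, 11)],
}
COLUMNS = ("high", "low", "middle")


def totals_by_test(table):
    """Sum each column over the three sub-test rows of every test."""
    return {
        test: dict(zip(COLUMNS, (sum(col) for col in zip(*rows))))
        for test, rows in table.items()
    }


totals = totals_by_test(TABLE_VI)

# For each column, the test with the largest total.
leaders = {c: max(totals, key=lambda t: totals[t][c]) for c in COLUMNS}

for column, test in leaders.items():
    print(column, test, totals[test][column])
```

Under these assumptions the totals single out the Sangren-Woody for extreme
high scores, the Van Wagenen-Dvorak for extreme low scores, and the Gates for
middle scores.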
Examination of Table V² reveals that the mean grade score of the three
basic parts of all the tests was 6.09, while the actual grade level of all
subjects was only 5.70. Of the five tests, the Sangren-Woody gave the highest
average grade score and the Van Wagenen-Dvorak gave the lowest. Similarly,
the first column of Table VI³ reveals that it was the Sangren-Woody sub-tests
which most consistently yielded the extreme high grade scores. From the
second column it will be noted that the Van Wagenen-Dvorak sub-tests most
frequently yielded the lowest grade scores. The last column reveals that it
was the Gates sub-tests which most consistently yielded the middle grade
scores. The average grade scores of the Pressey, Gates, and Iowa Tests all
clustered closely about the central tendency. However, a separate consideration
of the data for each of rate, vocabulary, and comprehension reveals some
additional important deviations.
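The comparison of each test's average grade score with the general consensus
can be sketched as follows. This Python fragment is illustrative and not part
of the original analysis. It assumes the test averages and the consensus of
6.09 as they are given in Table V, and the names test_averages, deviations,
and ranked are hypothetical.

```python
# General consensus and the average grade score of each test (Table V).
CONSENSUS = 6.09
test_averages = {
    "Pressey": 5.99,
    "Gates": 6.15,
    "Sangren-Woody": 6.71,
    "Iowa": 6.03,
    "Van Wagenen-Dvorak": 5.28,
}

# Signed deviation of each test's average from the consensus.
deviations = {
    test: round(avg - CONSENSUS, 2) for test, avg in test_averages.items()
}

# Tests ordered from closest to the consensus to farthest from it.
ranked = sorted(deviations, key=lambda test: abs(deviations[test]))

for test in ranked:
    print(f"{test:20s} {deviations[test]:+.2f}")
```

On these figures the Pressey, Gates, and Iowa averages fall within about a
tenth of a grade of the consensus, while the Sangren-Woody lies well above it
and the Van Wagenen-Dvorak well below it.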

All the tests except the Pressey Diagnostic yielded higher mean grade
scores for rate than for vocabulary or comprehension. This order was reversed
by the Pressey Test, which yielded its highest mean grade score for
comprehension and the lowest for rate. A careful examination of the completed
test booklets revealed that the subjects made very few errors in marking the
answers to the questions which are interwoven with the sentences in the
sub-test for rate. It would appear that the subjects tended to slow down in
order to get the answers right. This does not detract from the diagnostic value
of the Pressey Rate sub-test, for it provides very definite clues as to the
reading habits and skills of the testees. However, the fact that the Pressey
Rate sub-test yielded a significantly abnormal distribution of raw scores, as
illustrated in Figure 1, does detract from its value as a diagnostic instrument.


²See page 35 of this study.

³See page 36 of this study.

The mean grade score for the vocabulary part of the Pressey Test, as
reported in Table III, is within .03 of a grade of the general consensus;
therefore, it would appear that the results yielded by this sub-test might be
regarded as being fairly accurate. However, it will be remembered that the
vocabulary sub-test yielded one of the two distributions which were significantly
platykurtic at the .01 level. Finally, the Pressey Comprehension sub-test, which
calls for only one type of response, yielded the highest mean grade score of any
of the comprehension sub-tests. This combination of adverse factors tends to
lessen the value of the Pressey Test as an instrument of diagnosis.

The range of mean grade scores yielded by the fifteen sub-tests was from a
low of 4.90 for the Van Wagenen-Dvorak Comprehension sub-test to a high of 7.02
for the Sangren-Woody Rate sub-test. The Sangren-Woody Test is a timed test;
that is, specific time limits ranging in length from one to eight minutes are
imposed for each part of the test, with a total of twenty-seven minutes being
allowed for the completion of all the parts. The main criticism of the use of
the timed tests for diagnostic purposes stems from the fact that the imposition
of time limits does not permit each testee to attempt all the items in each
part of the test. Indeed, in this study many pupils were found to have
completed a large portion of the items included in many of the sub-tests without
making any errors. It would seem, therefore, that timed tests of vocabulary
or comprehension should not be used for diagnostic purposes with subjects known
to be handicapped by slow reading speed. Another criticism stems from the fact
that the Sangren-Woody Reading Test yielded the highest mean grade score of any
of the tests and rated the subjects on the average .7 of a grade above the
general consensus. Therefore, despite the fact that this test yielded no
markedly deviant distributions of raw scores, its use as a diagnostic instrument
seems limited.

The Iowa Test is also a timed test. Therefore, its diagnostic value is
conditioned by the limitations already mentioned for tests which have specific
time limits. The Iowa Rate sub-test yielded a very high mean grade score.
Like the Sangren-Woody sub-test for rate, it employs one-minute intervals as a
timing device and provides no checks on comprehension. In addition, it yielded
the widest range of grade scores of any of the fifteen sub-tests. The
Comprehension sub-test, on the other hand, yielded a mean grade score which
closely approximates the criterion score, but the distribution of raw scores for
this sub-test was found to show a significant degree of positive skewness at the
.01 level. Therefore, because this sub-test tends to yield scores that bunch
closely below the mean and spread out too much above it, as illustrated in
Figure 2⁴, the results it yields must not be regarded as being definitive.
These considerations place limitations on the use of the Iowa Tests for both
grading and diagnostic purposes.

The Van Wagenen-Dvorak Tests, which impose no time limits except for rate,
yielded the lowest mean grade score of any of the five tests. That part of the
test devoted to comprehension contains ten pages of material consisting of
twenty-three long paragraphs and one hundred questions, and requires about an
hour's working time to complete. Moreover, it was designed to be machine scored,
so all answers are recorded on an answer blank which is separate from the test
booklet. Such a test must certainly appear a little overwhelming to the slow
reader or to the immature child. However, it does permit each testee to work
at his habitual rate and at least attempt all parts. In addition, it provides
twenty tasks for each of five types of comprehension and yields separate scores
for each of these parts as well as a total score. Therefore, despite the facts
that three of the distributions proved to be significantly deviant from the
normal tendency and that the tests yielded comparatively low grade scores,
the investigator regards them as being more useful as diagnostic instruments
than the shorter timed tests.


⁴See page 28 of this study.

The Gates Survey Test yielded mean grade scores which most nearly coincided
with the general consensus and as a result appears to be the most satisfactory
instrument. However, the diagnostic value of the comprehension sub-test is
limited because its scope is narrow. Although it employs a device that requires
the reader to think about the sentences he has read, and provides material of
increasing difficulty, it measures the subject's ability in only one important
aspect of comprehension. Nevertheless, the Gates Survey Test appears to be an
excellent one for the purpose of determining in which of the three areas of
rate, vocabulary, or comprehension a pupil's strengths and weaknesses lie. Once
this fact has been determined, other more diagnostic tests might be used to
identify the particular abilities or skills within each area in which the pupil
is retarded.

Consideration of the standard deviations of the distributions of grade
scores reveals that those for rate are much larger than the ones for vocabulary
and comprehension. The distributions of grade scores for the five rate tests,
therefore, may not be accepted as entirely normal because they tend to be more
dispersed than the others. The Iowa Rate sub-test with a standard deviation of
3.56 and the Sangren-Woody sub-test with 2.86 deviated most from the general
consensus of 1.96. Another important tendency revealed by the data on grade
scores is that the standard deviations for vocabulary and comprehension cluster
more closely about their central tendencies than do those for rate.
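The two observations about the standard deviations can be checked with a brief
sketch. The Python below is offered as an illustration under stated
assumptions and is not the investigator's procedure. It uses the sigmas as
they appear in Table V, and it measures how closely they cluster with the
population standard deviation (statistics.pstdev), a choice the study itself
does not specify. The names SIGMAS, spread, and largest are hypothetical.

```python
from statistics import pstdev

# Standard deviations of the grade-score distributions, from Table V.
SIGMAS = {
    "rate": {"Pressey": 2.04, "Gates": 2.20, "Sangren-Woody": 2.86,
             "Iowa": 3.56, "Van Wagenen-Dvorak": 2.06},
    "vocabulary": {"Pressey": 1.86, "Gates": 1.40, "Sangren-Woody": 1.78,
                   "Iowa": 1.70, "Van Wagenen-Dvorak": 1.76},
    "comprehension": {"Pressey": 1.98, "Gates": 1.45, "Sangren-Woody": 1.72,
                      "Iowa": 1.89, "Van Wagenen-Dvorak": 1.12},
}
CONSENSUS_SIGMA = 1.96  # average sigma reported in Table V

# How tightly the five sigmas in each area cluster about their own mean.
spread = {area: pstdev(list(vals.values())) for area, vals in SIGMAS.items()}

# The two sub-tests whose sigma lies farthest from the consensus sigma.
largest = sorted(
    ((abs(sigma - CONSENSUS_SIGMA), test, area)
     for area, vals in SIGMAS.items()
     for test, sigma in vals.items()),
    reverse=True,
)[:2]

for area, value in spread.items():
    print(f"{area:14s} spread of sigmas = {value:.2f}")
for gap, test, area in largest:
    print(f"{test} {area}: {gap:.2f} from the consensus sigma")
```

With these figures the sigmas for rate are the most scattered, and the Iowa
and Sangren-Woody rate sub-tests lie farthest from the consensus of 1.96.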

Significant patterns may also be observed among the measures included in
Table V. Although the tests all yielded wide ranges of grade scores for rate,
only three, Pressey, Sangren-Woody, and Iowa, produced correspondingly wide
ranges for vocabulary and comprehension. While these three yielded uniformly
wide ranges in grade scores, the Gates and Van Wagenen-Dvorak tests tended to
exhibit a larger degree of variation among the ranges for their sub-tests. The
narrowest range of grade scores, 3 to 9, was yielded by both the vocabulary and
comprehension parts of the Gates Test.

APPLICATION  OF  CRITERIA 

Before  a  concluding  statement  of  the  findings  of  this  study  can  be  made, 
consideration  must  be  given  to  an  application  of  the  criteria  listed  in  Chapter 
II  in  an  effort  to  discover  any  further  structural  strengths  or  weaknesses 
which  might  influence  an  estimate  of  the  diagnostic  value  of  each  of  the  five 
tests.

Throughout this thesis reference has been made to the influence of the
scope of a test on its diagnostic efficiency. A diagnostic test of reading
ability is generally regarded as one which not only provides information
relating to how a student's achievement in such a basic ability as comprehension
compares with his achievement in rate or vocabulary, but also provides scores
which enable a teacher to determine how each testee stands in a number of
sub-skills in relation to his average score for comprehension, rate, or
vocabulary. The lack of such scope in the comprehension sub-test is an important
weakness of the Pressey and Gates Tests so far as their respective diagnostic
capacities are concerned. The Sangren-Woody, Iowa, and Van Wagenen-Dvorak Tests
each provide a number of sub-tests under comprehension, as indicated by the
bracketed parts in Table I⁵. However, the sub-tests in both the Iowa and
Sangren-Woody Tests appear too short and, because specific time limits are
imposed, too limited to test adequately the areas included.

The Iowa and Van Wagenen-Dvorak Tests each contain two sub-tests in
addition to the ones included under Comprehension. The Iowa Test provides for
the measurement of a pupil's skill in alphabetizing and in using an index. From
the results of the testing program it seems apparent that these skills can be
taught, for, in some cases, whole classes scored much lower on these items
than on the other parts of the test, and in one case nearly all the members of
a class scored higher.

Of the two additional sub-tests contained in the Van Wagenen-Dvorak Tests,
the one entitled "Perception of Relations" appears to be a good test of a
pupil's ability to think while he is reading and to perceive relationships.
Although the sub-test designated "Range of General Information" has
application for diagnostic purposes, it occasionally calls for knowledge of
facts quite foreign to Canadian children.

Only three of the five rate sub-tests, those in the Pressey, Gates, and Van
Wagenen-Dvorak Tests, provide for a reading time of five minutes or more, but
one of these, the Pressey Rate sub-test, yielded a significantly skewed
distribution at the .01 level and a low mean grade score. The Iowa and
Sangren-Woody Rate sub-tests, which yielded very high mean grade scores and
correspondingly large standard deviations, employ one-minute intervals as timing
devices and provide no checks on comprehension. Therefore, it would appear that
the Gates and Van Wagenen-Dvorak Rate sub-tests were the most satisfactory for
the purpose of this study. However, it should be noted here that the diagnostic
value of any one of these five rate tests is limited because an examiner can
evaluate a subject's achievement in rate only in terms of his achievement in the
other parts of the test. Cronbach concluded that a test of vocabulary should
seek to measure the depth and breadth of meaning which a pupil attaches to a
word as well as the range of his vocabulary⁶. None of the five tests provides
for such an evaluation. Each contains a sub-test for vocabulary with the words
presented in a manner which provides no clues to their meaning and in the order
of increasing difficulty. Only one test, that by Van Wagenen and Dvorak,
contains an additional sub-test in which words are presented in context, thus
enabling the examiner to compare a subject's achievement in at least two aspects
of vocabulary development.


⁵See page 13 of this study.

From  the  foregoing  discussion  it  appears  that  the  Van  Wagenen-Dvorak  Tests 
are  of  the  most  value  as  diagnostic  instruments.  However,  the  investigator 
found  that  the  use  of  separate  answer  blanks  lessened  the  value  of  these  tests 
for  the  purpose  of  individual  diagnosis.  In  cases  where  retardation  is  evident 
the  teacher  often  wishes  to  study  the  actual  responses  made  by  the  subject  but 
when  answer  blanks  are  employed  this  cannot  readily  be  done.  Moreover,  the 
answers  to  the  one  hundred  tasks  which  constitute  the  five  sub-tests  of  the 
Van  Wagenen-Dvorak  Comprehension  Test  are  so  intermingled  as  to  make  individual 
examination  of  a  pupil's  responses  in  one  part  of  the  test  extremely  difficult. 
This  criticism  does  not  apply  to  the  other  four  tests  all  of  which  provide  for 
the  answers  to  be  marked  directly  in  the  test  booklets. 


⁶See page 20 of this study.

SUMMARY OF THE FINDINGS

1. The Sangren-Woody Reading Test yielded the mean grade score which
deviated most from the central tendency.

2. The sub-tests for rate in both the Sangren-Woody and Iowa Tests
yielded the mean grade scores which deviated most from the average.

3. Timed tests of vocabulary and comprehension such as the Iowa and
Sangren-Woody sub-tests should not be used for diagnostic purposes with pupils
known to have slow reading speed.

4. The analysis of the results of the Pressey Test revealed, among other
things, a positively skewed distribution of scores for rate, a platykurtic
distribution for vocabulary, and a very high mean grade score for
comprehension. This combination of adverse factors tends to make it of little
value as an instrument of diagnosis.

5. Despite the fact that the Van Wagenen-Dvorak Comprehension sub-tests
yielded a low mean grade score, they were regarded as superior to the other
comprehension sub-tests for the purpose of diagnosis.

6. The Gates Survey Test is useful for the purpose of identifying the
general area in which a pupil's strengths or weaknesses lie.

7. The distributions for rate generally yielded the largest standard
deviations and consequently the widest ranges of grade scores.

8. The narrowest range of grade scores, 3 to 9, was yielded by both the
Gates Vocabulary and Comprehension sub-tests.

9. The tests which were widest in scope seemed to be of the most value
as diagnostic instruments.

10. From the results yielded by the sub-tests for alphabetizing and use of
an index in the Iowa Test it appears that these two skills can be taught within
the range of grades covered by this study.

11. The Gates and Van Wagenen-Dvorak Rate sub-tests proved to be the
most satisfactory instruments for measuring rate of reading.

12. Only the Van Wagenen-Dvorak Tests provide for comparison of a
subject's achievement in more than one type of vocabulary exercise.

13. The use of separate answer blanks for the Van Wagenen-Dvorak
Comprehension sub-tests makes the task of examining the responses made by each
subject most difficult.

CHAPTER VI


CONCLUSION 

THE  FINDINGS 

The  purpose  of  this  study  was  to  evaluate  the  diagnostic  capacities  of 
certain  group  silent  reading  tests  at  the  elementary  school  level.  A  summary 
of  the  more  important  findings  follows: 

1.  The  tests  which  were  widest  in  scope  appeared  to  be  the  most  valid 
diagnostic  instruments. 

2. Those for which there were no specific time limits imposed for the
vocabulary and comprehension sub-tests appeared to be better tests for
diagnostic purposes than those for which specific time limits were imposed.

3.  Tests  such  as  the  Gates  Survey  which  are  limited  in  scope  but  which 
possess  a  high  degree  of  validity  proved  to  be  useful  for  identifying  the 
areas  in  which  a  subject's  strengths  and  weaknesses  lie. 

4. The use of a period of five minutes or more for timing tests of
reading rate proved to be more satisfactory than the use of one-minute intervals.

5.  Rate  of  reading  tests  such  as  those  included  in  this  study  are  useful 
chiefly  for  the  purpose  of  comparing  a  subject's  level  of  achievement  in  rate 
with  his  achievement  in  other  phases  of  general  reading  ability. 

6. Too much reliance should not be placed on the individual scores
yielded by any one sub-test.

7. The value of these tests in diagnosing pupil difficulties in
vocabulary is limited because they do not seek to measure breadth or precision
of [meaning].

8. The use of separate answer blanks makes the task of examining the
actual responses made by each subject most difficult.

The findings seem to justify the conclusion that the diagnostic value of
the reading tests included in this study lies chiefly in their ability to
provide information about the level of the subject's achievement in rate,
vocabulary, and comprehension.

RECOMMENDATIONS  FOR  THE  USE  OF  THE  TESTS 

The following recommendations apply only to the use of the five tests
included in this study for diagnostic purposes:

1.  The  investigator  recommends  that  the  Gates  Survey  Tests  be  used  to 
identify  the  area  in  which  a  subject  is  most  retarded. 

2. If a test of rate only is desired, the Gates or the Van Wagenen-Dvorak
sub-tests for rate are recommended.

3.  The  vocabulary  tests  in  Part  II  of  the  Van  Wagenen-Dvorak  Tests  may 
be  used  to  compare  a  subject’s  ability  to  recognize  words  in  isolation  with 
his  ability  to  recognize  words  in  context. 

4.  Part  III  of  the  Van  Wagenen-Dvorak  Tests  is  recommended  for  use  in 
diagnosing  a  subject’s  ability  in  each  of  several  types  of  comprehension. 

5. Timed tests such as the Sangren-Woody and Iowa Tests are not
recommended for use as diagnostic instruments with pupils known to be, or
suspected of being, handicapped by a slow reading speed.